Data Analyst Salary Estimator

This data science project performs exploratory data analysis on a dataset of job listings and builds a model that predicts Data Analyst salaries from company characteristics and the job listing itself.

The resulting tool estimates Data Analyst salaries for job listings in the USA with a mean absolute error (MAE) of about $14.8K.

Several features were engineered from the Job Description, such as whether a listing mentions Python, Excel, or AWS, to see how much value companies place on these skills. In addition, a word cloud was used to surface the most common words in the Job Description field.
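As a minimal sketch of this kind of feature engineering, the snippet below flags whether each listing mentions a given skill and then counts the mentions. The sample rows and the `_flag` column names are hypothetical, and the project may have extracted these features differently.

```python
import pandas as pd

# Hypothetical sample rows; the real text comes from the Job Description column.
jobs = pd.DataFrame({
    "Job Description": [
        "Build dashboards in Excel and automate reports with Python.",
        "Maintain data pipelines on AWS; Python and SQL required.",
        "Prepare weekly summaries for the sales team.",
    ]
})

# Skills named in the text above. Word boundaries (\b) avoid partial matches,
# so "excellent" is not counted as "excel".
keywords = ["python", "excel", "aws"]

for kw in keywords:
    # 1 if the listing mentions the keyword (case-insensitive), else 0.
    jobs[f"{kw}_flag"] = (
        jobs["Job Description"]
        .str.contains(rf"\b{kw}\b", case=False, regex=True)
        .astype(int)
    )

# Number of listings that mention each skill.
keyword_counts = jobs[[f"{kw}_flag" for kw in keywords]].sum()
print(keyword_counts)
```

Binary flags like these can be passed straight to a regression model alongside the company features.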

The features contained in the dataset are the following:

  • Job Title
  • Salary Estimate
  • Job Description
  • Company Rating
  • Company Name
  • Location
  • Headquarters
  • Size
  • Year Founded
  • Type of Ownership
  • Industry
  • Sector
  • Revenue
  • Competitors
  • Easy Apply

Some Exploratory Data Analysis (EDA)

I created a word cloud to visualize the most common words in the Job Description field.
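A word cloud is essentially a picture of word frequencies. The sketch below computes those frequencies with the standard library only; the sample descriptions and the small stop-word list are assumptions for illustration, not the project's actual data or settings.

```python
import re
from collections import Counter

# Hypothetical job descriptions standing in for the real column.
descriptions = [
    "Analyze data and build reports for the business team.",
    "Use SQL and Python to analyze business data.",
]

# Small illustrative stop-word list (an assumption; a real analysis would use a fuller one).
STOP_WORDS = {"and", "the", "for", "to", "use", "a", "of", "in"}

def word_frequencies(texts):
    """Count lowercase alphabetic tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

freqs = word_frequencies(descriptions)
print(freqs.most_common(3))
```

If the third-party `wordcloud` package is installed, a frequency mapping like `freqs` can be rendered with `WordCloud().generate_from_frequencies(freqs)`.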

I also plotted the number of companies in each age group.
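One way to build such age groups is to derive a company age from the founding year and bin it with `pandas.cut`. In the sketch below the `Founded` column name, the reference year, and the bin edges are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical founding years; the column name "Founded" is an assumption.
companies = pd.DataFrame({"Founded": [2015, 1998, 1950, 2019, 1985]})

# Assumed reference year for computing company age.
REFERENCE_YEAR = 2020
companies["Age"] = REFERENCE_YEAR - companies["Founded"]

# Illustrative bin edges; intervals are right-inclusive, e.g. (10, 25].
bins = [0, 10, 25, 50, 200]
labels = ["0-10", "11-25", "26-50", "51+"]
companies["Age Group"] = pd.cut(
    companies["Age"], bins=bins, labels=labels, include_lowest=True
)

# Number of companies per age group, in bin order.
group_counts = companies["Age Group"].value_counts().sort_index()
print(group_counts)
```

With matplotlib available, `group_counts.plot(kind="bar")` would draw the corresponding bar chart.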

Model Building and Tuning

In this section, I split the data into 80% for training and 20% for testing, and compared Linear Regression, Random Forest, K Nearest Neighbors, and Bagging regressors using cross-validated Negative Mean Absolute Error (NMAE). The Random Forest Regressor had the lowest mean absolute error, so I selected it.
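The comparison described above can be sketched with scikit-learn as follows. The data here is synthetic (`make_regression`), and the fold count, random seeds, and estimator settings are assumptions, so the ranking it prints says nothing about the results on the real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the engineered features and salary target.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# 80% training, 20% testing, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "K Nearest Neighbors": KNeighborsRegressor(),
    "Bagging": BaggingRegressor(random_state=0),
}

cv_mae = {}
for name, model in models.items():
    # scikit-learn reports the negated MAE so that higher is always better.
    scores = cross_val_score(
        model, X_train, y_train, scoring="neg_mean_absolute_error", cv=5
    )
    cv_mae[name] = -scores.mean()  # flip the sign back to a plain MAE

# Print models from lowest to highest cross-validated MAE.
for name, mae in sorted(cv_mae.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.2f}")
```

Because the scorer is negated, the best model is the one whose score is closest to zero, which is the same as the lowest plain MAE.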

To tune the model, I used Grid Search CV to find the best parameters. Finally, I evaluated the tuned model on the test set and obtained a Mean Absolute Error of 14.76.
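A minimal sketch of this tuning step with scikit-learn's `GridSearchCV` is shown below. The synthetic data and the parameter grid are hypothetical; the grid actually searched in the project is not stated here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real features and salaries.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid; the values are chosen only to keep the example fast.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_train, y_train)  # cross-validates every parameter combination

# By default the best estimator is refit on the full training set,
# so it can be scored directly on the held-out test data.
test_mae = mean_absolute_error(y_test, search.best_estimator_.predict(X_test))
print(search.best_params_, round(test_mae, 2))
```

Keeping the test set out of the grid search means the final MAE is measured on data the tuning never saw.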

Link to GitHub Repository